
Determining likelihood of Exoplanet discovery under different Astronomical conditions

Group Name: Group 40

Name(s) & ID(s) of Group Members: Adrian Rebellato (S3889401), Rafat Mahiuddin (s3897093), Arthul George (S3918048)

Introduction

Humanity has discovered thousands of planets outside our solar system using a variety of methods. What most of these methods have in common is the detection of an anomaly in the signal from a visible star. This could be an exoplanet passing in front of the star from our perspective, lowering its brightness, or a wobble imparted by the mass of a planet as the planet and the star pull on each other. We suspect that the methods used to discover planets bias the kinds of planets that are discovered, and we hope to explore this link.

Dataset Source

This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program (1)

The data is sourced from multiple missions which had/have the goal of discovering exo-planets. Some missions include: TESS, Kepler, K2, KELT and UKIRT.

Dataset Details

The dataset is a subset of the information provided by NASA about all discovered exoplanets. An exoplanet, or extra-solar planet, is any planet-sized body outside of our solar system. Such planets may either orbit another star or be star-less. (2)

NASA's archive provides many features, which we limited to 28. After further filtering features which are not applicable to our study, we have 18 features, and 4521 entries.

Dataset Variables

The features in our dataset are described in the table below.

Response Variables

Our response variable is pl_rade (planet_radius). This is the size of the discovered exoplanet. Our study focuses on the discoverability of planets, and we suspect that the radius of a planet significantly affects how easily it is discovered. Initially we attempted to use sy_pnum as our target variable but, being a nominal categorical variable, it would not have been suited to a linear regression model.

Choosing this as our response variable will allow us to investigate which features influence the size of a discovered exoplanet. As the methods used to discover exoplanets benefit from larger planets, this will provide us with insight into the effectiveness of these methods.

Goals and Objectives

Our goal is to discover which features influence the likelihood of exoplanet discovery. We have to make a few assumptions to do this effectively using linear regression. The most fundamental assumption is that the distribution of exoplanets in our galaxy is not correlated with their distance from our solar system. This assumption is important because it allows us to infer that trends in the observed distribution of exoplanets are due to undiscovered exoplanets.

We will also assume that planet radius is correlated with ease of discovery. We can safely assume this, as planet radius is strongly related to planet mass, and both of these are critical for all popular exoplanet discovery methods. We will also confirm that the majority of our data comes from the Kepler mission, which utilizes the transit method. (8) This method discovers exoplanets as they pass in front of distant stars. Larger planets cause a more significant drop in brightness, which increases the distance at which exoplanets can be discovered.

Our main objective is to predict the size of discovered exoplanets based on features of the solar system. By predicting the size, we are actually determining what features limit the discoverability of planets smaller than the predicted size. Our secondary objective is to perform exploratory data analysis using basic visualization, to further understand the features of our dataset. This Phase 1 report details this exploration.

Import Dataset


Data Cleaning and Preprocessing

Create Data Branch

Copy data from main into a working dataframe. This is done to prevent reimporting the entire data from source repeatedly. Note: we have not reduced the precision of the data to 3 by using "df = df.style.set_precision(3)" due to the nature of some of our float variables.
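The branching step can be sketched as follows. This is a minimal illustration, assuming the imported table is held in a DataFrame called `main` (a hypothetical name here); the values are toy data, not archive records.

```python
import pandas as pd

# Hypothetical stand-in for the imported archive table (toy values only).
main = pd.DataFrame({"pl_rade": [1.0, 2.5, 13.0], "sy_pnum": [1, 2, 1]})

# copy() makes a deep copy by default, so edits to df leave main untouched
# and the source never has to be re-imported.
df = main.copy()
df.loc[0, "pl_rade"] = 99.0
```

After this, `main` still holds the original radius in its first row, while `df` holds the modified one.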

Removing Unnecessary Columns

The following columns have been identified as either not useful for our analysis of the target feature or not suitable for machine learning.

Variables to drop:

Renaming Columns

Calculate Additional Columns

For greater data visualisation of star systems, additional columns must be calculated. These columns include the following:

The above shall be calculated in the following cells.

Data Properties

The following code analyses some important properties of the modified data before any other processing is completed.

Outliers

Using the standard 1.5 * IQR outlier check, systems with more than 4 planets or more than 2 stars would be considered outliers.

We know that these are reasonable data points, so removing them would not be helpful for our study. Because of this, we tested a 3.0 * IQR outlier check, which is common for astronomical data. (3)

Further investigation into our dataset revealed the dominance of the data from the Kepler mission. Unfortunately, even the 3 * IQR check removes most results that were not supplied by the Kepler mission. Using this check would significantly distort some relationships in our dataset.

The code below is our outlier check for 3.0 * IQR. We have chosen not to implement the check due to reasons mentioned previously and for the sake of visualisation. This might change for Phase 2.
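For readers unfamiliar with the rule, an IQR outlier check of this kind could look like the sketch below. The function name and the sample values are illustrative assumptions, not the notebook's own code; `k` is the multiplier (1.5 for the standard check, 3.0 for the relaxed one).

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Illustrative feature values only (not taken from the archive).
sample = pd.Series([1, 1, 1, 1, 2, 2, 2, 3, 6, 12])

standard = iqr_outlier_mask(sample, k=1.5)  # standard 1.5 * IQR check
relaxed = iqr_outlier_mask(sample, k=3.0)   # relaxed 3.0 * IQR check
```

On this toy sample the relaxed check flags fewer points than the standard one, which is the behaviour the report relies on when comparing the two thresholds.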

Processing Rows

Dropping NaN values

Append Target Feature

Append target feature to the end of the table.
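The two row and column steps above (dropping rows with missing values, then moving the target to the last column) can be sketched as below. `pl_rade` and `sy_pnum` are the names used in this report; `sy_dist` and the values are assumptions for illustration.

```python
import pandas as pd

# Toy frame with missing values; sy_dist is an assumed column name.
df = pd.DataFrame({
    "pl_rade": [1.0, None, 11.2],
    "sy_pnum": [1, 2, 3],
    "sy_dist": [10.0, 250.0, None],
})

# Drop every row that has at least one missing value.
df = df.dropna().reset_index(drop=True)

# Reorder the columns so the target feature comes last.
target = "pl_rade"
df = df[[c for c in df.columns if c != target] + [target]]
```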


Data Exploration and Visualisation

Univariate Visualisation

Popularity of Star Count for Discovered Planets

This graph shows that the majority of star systems contain a single star. While it is rare, our dataset does contain systems with 2 stars, 3 stars and a single system with 4 stars.

Popularity of already discovered planets in the same system

This chart depicts the number of systems for each planet count. Our data consists predominantly of single-planet systems, with a maximum planet count of 8. There is only a single system with 7 planets, and a single system with 8.

The above graphs are single value density plots used to describe the distribution of certain features in our dataset.

Most of the above follow a skewed normal distribution. However, the first plot, depicting planet radius, contains two peaks. We suspect that the first peak, at about 2.5, is due to the Kepler mission, which had the goal of discovering Earth-like planets. Further investigation (see below) revealed that Kepler was responsible for the majority of our data points. This may explain how extreme the peak is in comparison to the peak at 13.

The right-skewed nature of the 'planets discovered over distance' plot supports our assumption that exo-planets are harder to discover as distance increases.

Double Variable Visualisation

Discovered planets against the number of stars of the same system

The above graph demonstrates the likelihood of discovering multiple planets depending upon the number of stars in a system.

Triple and binary star systems are portrayed as yielding a greater rate of planet discovery within the same system when compared against single and quadruple systems. This may be due to a few reasons:

  1. Methods such as "the so-called radial velocity method to detect the periodic red and blue-shifting of starlight as the star and planet orbit around their common center of mass" tend to perform well when multiple light sources (stars) are present. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4253839/)
  2. As noted by The Astronomical Journal, observations of multi-star systems are often more focused and prolonged compared to single-star systems. Research into such systems is often based on the stars themselves rather than their planets. Consequently, equipment may be pointed at multi-star systems for longer periods at a time, resulting in a higher probability of observing an orbiting planet passing in front of its star. (https://iopscience.iop.org/article/10.3847/1538-3881/ab4137)
  3. The graph shows a noticeably smaller number of planets discovered for quadruple star systems (double binary systems). This may be a result of sampling bias. Due to their extreme rarity and distance from Earth, they are often harder to discover and observe. Although NASA notes that such systems are more common than once thought, only two planets had been found in quadruple systems as of 2015. (https://exoplanets.nasa.gov/news/185/four-fathers-new-exoplanet-discovery-part-of-a-quadruple-star-system/)

Planet discoveries based upon their orbital period and orbital radius

This plot depicts a strong correlation between the orbital period of an exo-planet and its semi-major axis. The semi-major axis is a good estimate of the distance at which the planet orbits its star. It is only an estimate, as orbits may not be perfectly circular.

This relationship makes sense, as the orbital period and orbital distance are approximated well by Kepler's orbital formula. (5) Interestingly, many of the exo-planets with short orbital periods (bottom left of the graph) fall below the otherwise straight line. To further understand this relationship, the datapoints were coloured based on the exo-planet's radius. All of the datapoints which fall below the otherwise straight line happen to be smaller planets. While it is out of the scope of this study, we suspect that this anomaly could be caused by the effects of General Relativity, similar to how the orbit of our neighbouring planet Mercury differs from Kepler's formula due to these effects. (6)
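The expected straight line can be illustrated with Kepler's third law. The sketch below assumes solar units (semi-major axis in AU, period in years, stellar mass in solar masses) and neglects the planet's own mass; the function name is hypothetical and not part of the notebook.

```python
def semi_major_axis_au(period_days: float, stellar_mass_msun: float = 1.0) -> float:
    """Kepler's third law in solar units: a^3 = M * P^2.

    a is in AU, P in years, M in solar masses; the planet's mass is neglected.
    """
    period_years = period_days / 365.25
    return (stellar_mass_msun * period_years ** 2) ** (1.0 / 3.0)

# Sanity checks against our own solar system.
earth = semi_major_axis_au(365.25)   # one-year orbit around a one-solar-mass star
mercury = semi_major_axis_au(87.97)  # Mercury's orbital period in days
```

Because a is proportional to P^(2/3) for a fixed stellar mass, the relationship appears as a straight line when both axes are logarithmic, with vertical scatter coming mainly from differences in stellar mass.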

Figure 1 shows a weak but growing relationship between planetary mass and orbital period as factors influencing discovery. However, the graph strongly favours a central hotspot, indicating that most planets discovered tend to have mass and orbital properties similar to those of Earth.

Figure 2 demonstrates a weak but growing relationship between planetary mass and radial orbiting distance as factors influencing discovery. This graph draws significant parallels to Figure 1, and features a similar central hotspot indicating Earth-like patterns.

Planetary mass, radial orbiting distance, and orbital period were all identified as playing an important role in planet discovery. As visualised in Figure 1 and Figure 2, some important patterns were noted:

Planet radius and their proximity to Earth

Mass to radius relation for discovered planets

Planet count per system over Distance

This plot shows a strong relationship between the number of planets discovered per system and the distance of that system from Earth, or our Sun. (Earth's average position in space is our Sun's position.)

The first column of this graph shows almost 50% of systems having 2 or more exo-planets present. However, after 2500 parsecs, no system has more than 1 exo-planet discovered within it. This downwards trend seems to begin at 1000 parsecs. Note: Some discoveries exist beyond 3000 parsecs, but they are all 1 exo-planet systems.

The distance figure represents an average distance from Earth at the time of discovery. A possible explanation for the reduced number of planets in more distant systems is the increased difficulty of discovery.
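The share of multi-planet systems per distance band, as read off the plot, could be computed along the following lines. This is a sketch on toy data: `sy_pnum` is the column named in this report, while `sy_dist` (distance in parsecs), the bin edges and the values are assumptions.

```python
import pandas as pd

# Toy systems: distance in parsecs and number of planets discovered.
systems = pd.DataFrame({
    "sy_dist": [50, 300, 800, 1200, 1800, 2600, 3200],
    "sy_pnum": [3, 2, 1, 2, 1, 1, 1],
})

# Group the systems into 1000-parsec bands.
bins = [0, 1000, 2000, 3000, 4000]
systems["dist_bin"] = pd.cut(systems["sy_dist"], bins=bins)

# Fraction of systems in each band with two or more discovered planets.
systems["multi"] = systems["sy_pnum"] >= 2
multi_share = systems.groupby("dist_bin", observed=False)["multi"].mean()
```

A share that falls towards zero in the outer bands would match the downward trend described above.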

This plot compares planet radius to distance. One notable feature of this plot is the slight upwards trend in the lower bound of the y axis. Whilst we have not explored this completely, this could be attributed to the difficulty of finding small planets as distance increases.

This is a closer view of the lower bounds of the previous graph. A radius of 1.0 is the radius of Earth, so from 2000 parsecs and beyond, no exoplanets smaller than Earth have been discovered.

Three Variable Visualisation

Number of planets against parallax and distance from our Sun

This graph contains a strong relationship between the Distance of an exo-planet from our system, the amount of parallax we can see from Earth, and the number of planets in each exo-planet's system.

Parallax is the apparent movement of an object across our night sky relative to more distant background objects, caused by the Earth's motion around the Sun. Nearby objects shift more than distant ones: stars in the Milky Way show far less parallax than planets in our own system, and other galaxies, which are many orders of magnitude further away again, show effectively none.

This plot shows that the amount of parallax we experience is inversely proportional to the distance the object is from us.
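The inverse relationship follows from the definition of the parsec: a star at a distance of 1 parsec has a parallax of 1 arcsecond. A minimal sketch, assuming the parallax is given in milliarcseconds (the function name is hypothetical):

```python
def distance_parsecs(parallax_mas: float) -> float:
    """Convert a parallax in milliarcseconds to a distance in parsecs (d = 1000 / p)."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive to yield a distance")
    return 1000.0 / parallax_mas

near = distance_parsecs(10.0)  # larger parallax, closer object
far = distance_parsecs(1.0)    # smaller parallax, more distant object
```

Halving the parallax doubles the distance, which is why the plotted points fall on a hyperbola rather than a straight line.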

Exo-planet projected location in relation to the number of planets discovered in its system

This graph is a simple projection of the galactic coordinates of each discovered exo-planet. In addition to the galactic latitude and longitude, we have scaled the size of each datapoint by the number of exo-planets in that system.

Initially we suspected that the large cluster of exo-planets visible left of centre was the galactic centre. This was a logical conclusion, however, the coordinates do not match up.

Instead we concluded that one of the exo-planet surveying missions that contributed to our dataset focussed on a single location. This large spot makes up the majority of our datapoints. We have determined that this is the data from the Kepler mission. The next graph focuses on this data alone.

This data accurately portrays the results from the Kepler mission. Each square visible is a sample taken by the orbital telescope. Kepler alone discovered over 3000 exo-planets. See Fig 1 from this source for another depiction of this data (including stars where no exo-planets were found).

Source: https://www.aanda.org/articles/aa/full_html/2020/03/aa36692-19/aa36692-19.html (7)

This graph is a map-like representation of the position of all exoplanets in relation to Earth. The position is based on the galactic longitude, where 0 degrees faces the galactic center, and the distance from the center of the graph is proportional to the distance from our solar system. Earth is the small white dot in the center of the graph. The marker sizes of Earth and all discovered exoplanets are proportional to each other, meaning that this graph is an accurate visualization of how scale affects exoplanet discovery over distance.

The second plot is identical to the first but restricted to 2000 parsecs, to get a clearer view closer to Earth. It is obvious that exoplanet discovery decreases with distance away from our solar-system, otherwise we would see a uniform distribution across this whole graph.
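The mapping behind such a top-down view can be sketched as a polar-to-Cartesian conversion. This is an illustration under stated assumptions, not the notebook's plotting code: the function name is hypothetical, galactic latitude is ignored, and the x axis is taken to point towards the galactic center.

```python
import math

def galactic_to_xy(glon_deg: float, dist_pc: float) -> tuple[float, float]:
    """Place a system on a top-down map centred on Earth.

    glon_deg is the galactic longitude in degrees (0 faces the galactic center)
    and dist_pc is the distance in parsecs. Latitude is ignored in this sketch.
    """
    theta = math.radians(glon_deg)
    return dist_pc * math.cos(theta), dist_pc * math.sin(theta)

towards_center = galactic_to_xy(0.0, 100.0)
perpendicular = galactic_to_xy(90.0, 100.0)
```

Restricting the view to 2000 parsecs then amounts to keeping only the points whose distance is at most 2000 before plotting.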

Literature Review

From the Mesopotamians to the ancient Greeks, the study of stars originated centuries ago. In the modern age, we now look at stars to study their properties, the system they orbit in, and to get a grasp of the universe we live in. However, after the moon landing, astronomers have become more excited about entertaining the possibility of life beyond our solar system. This has prompted many research institutions to go on the hunt for orbiting exoplanets outside of our solar system.

Most commonly, exoplanets are discovered using the transit method. This involves looking at a star and waiting for a dip in its brightness. If this dip in brightness occurs repeatedly over a certain period of time, it indicates that an orbiting exoplanet is dimming the star as it passes in front of it (21). Other methods involve “monitoring the spectrum of a star for the tell-tale signs of a planet pulling on its star and causing its light to subtly Doppler shift.”(https://exoplanets.nasa.gov/what-is-an-exoplanet/overview/) This is commonly referred to as the wobble method.

Research into exo-planets and their properties has been extensive; however, these studies have been conducted only by the select few agencies that possess the instruments to carry out such experiments. Although research into exoplanets is common, studies into the likelihood of exoplanet discovery under different astronomical conditions have been quite rare, with hardly any studies relating to our research goal. We will therefore reference similar research that has already been done on system formation, planet discovery and astronomical properties.

Our dataset combines multiple methods of exo-planet discovery, whereas a 2018 study relied on a single method, stating that “A coronagraphic or starshade-based direct imaging mission is the only path currently identified to characterize Earth-size planets in the habitable zones of a large sample of nearby Sun-like stars in reflected light” (20)(18). This may lead to a bias in the type of planets discovered; in particular, their size and mass may lean towards Earth-sized planets (10) (12).

In 2014, observations from the OSIRIS instruments provided astronomers with valuable data on exoplanets. The telescope's sensors produce transit waves that allow the discoverability of exoplanet atmospherics using a function of wavelengths(19). Changes within the function are small due to high altitude clouds that are present on these exoplanets. In order to create meaning out of the data, spectroscopic time series plots are required. Initially acquiring this data from telescopes was the best way to acquire quality data (19)(11)(12). However due to advancements in technology, similar precision data can be obtained from ground telescopes.

Chemical evidence from sun-like stars, such as inhomogeneous star system binaries, may also be used to identify how planets are formed from protostellar gas clouds (9). Other theories examine how solar systems, and as a result exoplanets, are formed through the daughter nuclei of short-lived radionuclides in calcium (10)(13). Identifying habitable planets in other systems is often very difficult using ground-based telescopes, as noted in this alpha Centauri study (12) and this system formation study (15). This may be due to factors relating to pollution and light noise, which may mask the slight dimming and wobble effects of stars. Consequently, our visualisations and dataset take advantage of satellite-based telescopes (for example, Kepler had a high success rate in finding planets (around 3000 total) (22) while TESS accounted for 2200 discoveries (23)). On another note, observing exo-planets with 2 stars is interesting, as their uniquely shaped orbits may influence their radius (16)(23).

Although astronomers have found thousands of planets, most orbit stars that will eventually become red giants and then white dwarfs (14)(17). Because of this, identifying and monitoring these planets becomes increasingly difficult as the light intensity from these stars decreases. Therefore, our work on predicting the size of exo-planets discovered under different astronomical conditions will prove useful in pointing telescopes towards systems that are most likely to possess exo-planets. In addition, our study intends to predict other planetary attributes in relation to size.


Summary & Conclusions

The possibilities that exoplanets may bring for future generations are endless. Up until 1992, they were presumed to exist, but none had been confirmed. Recent missions, in particular Kepler, have brought to light how common they really are. Our reliance on physical instruments has limitations, and with phase 2 of this study, we will investigate the gaps in the data that we have presented previously.

Phase 1 required us to process our data and approach the standard required for multiple linear regression modeling. While there is more to go (such as normalizing the data), the steps we took have allowed us to explore many significant relationships with our data. We removed unnecessary columns, calculated new columns using data inferred from previous features, found outliers and dropped rows with missing values. We are left with almost 3000 rows, with no missing or unusual data.

We made two assumptions at the beginning of the study: that the distribution of exoplanets is independent of their distance from Earth, and that the radius of a planet is correlated with ease of discoverability. Our exploration suggests that the latter is true. We can also assume the former is true (until proven otherwise), as it is the simplest and most widely agreed upon description of our universe among astrophysicists.

Exploration into the relationships between features revealed a strong link between orbital distance and orbital period, distance from earth and parallax, and planet mass and radius. We also discovered how the number of exoplanets per star system dropped to 1 after 2500 parsecs. This further supports our assumptions. Finally, exploring the positional relationship between exoplanets revealed to us the Kepler mission, and how it dominated our dataset. Without researching the mission itself, and matching patterns between our data and resources available online, we would have made false assumptions about that anomaly. Furthermore, removing outliers without understanding this impact on our dataset would have stripped our dataset of breadth.

Phase 2 of our project will attempt to solidify and quantify these discovered relationships.


References

  1. NASA Exoplanet Archive. (n.d.).

    Retrieved September 27, 2021, from https://exoplanetarchive.ipac.caltech.edu/ "This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program."

  2. Greicius, T. (2018, April 12).

    What in the World is an 'Exoplanet?' Retrieved from https://www.nasa.gov/feature/jpl/what-in-the-world-is-an-exoplanet

  3. Zhang, Y., Luo, A., & Zhao, Y.

    (2004). Outlier detection in astronomical data. Optimizing Scientific Return for Astronomy through Information Technologies. doi:10.1117/12.550998

  4. Aksakalli, V. , Yenice, Z., Kai Wong, Y., Ture, I., Malekipirbazari, M.

    Feature Selection and Ranking in Machine Learning http://www.featureranking.com/

  5. Orbit Formula - an Overview

    ScienceDirect Topics. Retrieved October 2, 2021 (https://www.sciencedirect.com/topics/engineering/orbit-formula).

  6. Nobili, Anna M. and Clifford M. Will. (1986).

    “The Real Value of Mercury's Perihelion Advance.” Nature 320(6057):39–41.

  7. Maliuk, A. and J. Budaj. (2020).

    “Spatial Distribution of Exoplanet Candidates Based on Kepler and Gaia Data.” Astronomy & Astrophysics 635.

  8. Johnson, Michele. (2015).

    “Kepler Mission Overview.” NASA. Retrieved October 2, 2021 (https://www.nasa.gov/mission_pages/kepler/overview/index.html).

    1. Anon. 2015.

      “Four-Fathers: New Exoplanet Discovery Part of a Quadruple-Star System.” NASA. Retrieved October 3, 2021 (https://exoplanets.nasa.gov/news/185/four-fathers-new-exoplanet-discovery-part-of-a-quadruple-star-system/).

    2. Tokovinin, A., Everett, M. E., Horch, E. P., Torres, G., Latham, D. W. (2019)

      “Speckle Observations and Orbits of ... - Iopscience.iop.org.” Retrieved October 3, 2021 (https://iopscience.iop.org/article/10.3847/1538-3881/ab4137/pdf).

    3. Turnbull, Margaret C. 2014.

      “Finding Planets and Life among the Stars.” EMBO Reports 15(10):1002–9.

Journals:

  9. Spina, L. (2021, August 30).

    Chemical evidence for planetary ingestion in a. . . Nature Astronomy. https://www.nature.com/articles/s41550-021-01451-8

  10. Forbes, J. C. (2021, August 16).

    A Solar System formation analogue in the. . . Nature Astronomy. https://www.nature.com/articles/s41550-021-01442-9

  11. Zhang, Y. (2021, August 13).

    Isotopes in an atmosphere. Nature. https://www.nature.com/articles/s41586-021-03616-x

  12. Wagner, K. (2021, March 4).

    NEARing a direct detection of terrestrial worlds? Nature Communications. https://www.nature.com/articles/s41467-021-21176-6

  13. Dumusque, X. (2012, October 17).

    Meet our closest neighbour. Nature. https://www.nature.com/articles/nature11572

  14. Vanderburg, A. (2020, September 16).

    Planet discovered transiting a dead star. Nature. https://www.nature.com/articles/s41586-020-2713-y

  15. Forbes, J. C. (2021b, August 16).

    A Solar System formation analogue in the. . . Nature Astronomy. https://www.nature.com/articles/s41550-021-01442-9

  16. Popp, M. (2017, April 6).

    Climate variations on Earth-like circumbinary. . . Nature Communications. https://www.nature.com/articles/ncomms14957

  17. David, T. J. (2016, June 20).

    A Neptune-sized transiting planet closely. . . Nature. https://www.nature.com/articles/nature18293

  18. Barclay, T. (2013, February 20).

    A sub-Mercury-sized exoplanet. Nature. https://www.nature.com/articles/nature11914

  19. Barclay, T. (2013b, February 20).

    A sub-Mercury-sized exoplanet. Nature. https://www.nature.com/articles/nature11914

  20. Arbesman, S. (2010, October 4).

    A Scientometric Prediction of the Discovery of the First Potentially Habitable Planet with a Mass Similar to Earth. PLOS ONE. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0013061

Conference Papers:

  21. Astronomy & Astrophysics (A&A). (n.d.).

    RMIT Library. Retrieved October 3, 2021, from https://www.aanda.org/articles/aa/pdf/2016/01/aa26313-15.pdf

  22. Magrin, DM. (2019).

    PLATO: the ESA mission for exoplanets discovery: (Vol 10698). Proceedings of SPIE. https://www.cosmos.esa.int/documents/343127/343227/106984X.pdf/3581ec9b-d8fb-48c5-7f75-8da5b3d31015

  23. Huang, C. X. (2018).

    TESS DISCOVERY OF A TRANSITING SUPER-EARTH IN THE π MENSAE SYSTEM. https://arxiv.org/pdf/1809.05967.pdf